Object Detection Algorithm with Dual-Modal Rectification Fusion Based on Self-Guided Attention
ZHANG Jinglei (1, 2, 3), GONG Wenhao (1, 2), JIA Xin (3)
1. Tianjin Key Laboratory for Control Theory and Applications in Complicated Industry Systems, Tianjin University of Technology, Tianjin 300384; 2. School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300384; 3. Engineering Training Center, Tianjin University of Technology, Tianjin 300384
Abstract: Traditional dual-modal object detection algorithms struggle with low-contrast noise in complex scenes such as fog, glare and darkness, and they fail to detect small objects effectively. To address these problems, an object detection algorithm with dual-modal rectification fusion based on self-guided attention is proposed. First, a dual-modal fusion network is designed to rectify the low-contrast noise in the input visible and infrared images through channel-wise and spatial-wise feature rectification. Complementary information is then extracted from the rectified features to achieve accurate feature fusion, which improves the detection accuracy of the algorithm in complex scenes. Second, a self-guided attention mechanism is established to learn the dependencies among image pixels, which improves the fusion of features at different scales and the detection accuracy for small objects. Extensive experiments on six datasets, covering pedestrian, pedestrian-vehicle and aerial vehicle detection, demonstrate the superiority of the proposed approach.
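The abstract names two mechanisms, channel/spatial feature rectification and pixel-level attention, without giving their form. The sketch below illustrates one plausible reading under stated assumptions; it is not the authors' implementation. Channel rectification is modeled on CMX-style cross-modal rectification (pooled descriptors from both modalities pass through a small MLP with sigmoid output, and each modality is corrected by the channel-weighted features of the other). Spatial rectification uses a 1x1 projection to per-pixel gates. The "self-guided attention" is stood in for by plain scaled dot-product self-attention over all pixels, since its exact design is not stated here. The function names, weight shapes, the mixing factor `lam = 0.5` and fusion by summation are all hypothetical, and the weights are random rather than learned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_rectify(v, r, w1, w2, lam=0.5):
    """Hypothetical channel-wise cross-modal rectification.

    v, r: visible and infrared feature maps, shape (C, H, W).
    w1: (4C, hidden) and w2: (hidden, 2C) are MLP weights (assumed shapes).
    """
    c = v.shape[0]
    # Average- and max-pooled channel descriptors from both modalities -> (4C,)
    desc = np.concatenate([v.mean(axis=(1, 2)), v.max(axis=(1, 2)),
                           r.mean(axis=(1, 2)), r.max(axis=(1, 2))])
    hidden = np.maximum(desc @ w1, 0.0)          # ReLU
    weights = sigmoid(hidden @ w2)               # (2C,) values in (0, 1)
    wv, wr = weights[:c], weights[c:]
    # Each modality is corrected by channel-weighted features of the other one
    v_out = v + lam * wr[:, None, None] * r
    r_out = r + lam * wv[:, None, None] * v
    return v_out, r_out

def spatial_rectify(v, r, w_s, lam=0.5):
    """Hypothetical spatial-wise rectification with per-pixel gates.

    w_s: (2, 2C) acts as a 1x1 convolution over the concatenated channels.
    """
    x = np.concatenate([v, r], axis=0)                   # (2C, H, W)
    gates = sigmoid(np.einsum('oc,chw->ohw', w_s, x))    # (2, H, W)
    # Gates of shape (H, W) broadcast over the channel axis
    v_out = v + lam * gates[1] * r
    r_out = r + lam * gates[0] * v
    return v_out, r_out

def pixel_self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over all H*W pixels, with residual.

    Stand-in for the paper's self-guided attention. x: (C, H, W);
    wq, wk: (C, d); wv: (C, C).
    """
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T               # (N, C), one token per pixel
    q, k, val = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])       # (N, N) pixel-to-pixel affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    out = tokens + attn @ val                    # residual connection
    return out.T.reshape(c, h, w), attn

# Usage with random (untrained) weights
rng = np.random.default_rng(0)
C, H, W = 8, 6, 6
vis = rng.standard_normal((C, H, W))
ir = rng.standard_normal((C, H, W))
v1, r1 = channel_rectify(vis, ir,
                         0.1 * rng.standard_normal((4 * C, 16)),
                         0.1 * rng.standard_normal((16, 2 * C)))
v2, r2 = spatial_rectify(v1, r1, 0.1 * rng.standard_normal((2, 2 * C)))
fused = v2 + r2  # hypothetical fusion by summation
out, attn = pixel_self_attention(fused,
                                 0.1 * rng.standard_normal((C, 4)),
                                 0.1 * rng.standard_normal((C, 4)),
                                 0.1 * rng.standard_normal((C, C)))
print(out.shape, attn.shape)  # (8, 6, 6) (36, 36)
```

Summation as the fusion step is only a placeholder; a learned fusion layer would normally sit there. Note that the attention matrix is N x N in the number of pixels, so pixel-level attention of this kind grows quadratically with feature-map size and is typically applied to downsampled feature maps.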